
unknown number


Online Bayesian Moment Matching for Topic Modeling with Unknown Number of Topics

Neural Information Processing Systems

Latent Dirichlet Allocation (LDA) is a very popular model for topic modeling and many other problems involving latent groups. It is both simple and effective. When the number of topics (or latent groups) is unknown, the Hierarchical Dirichlet Process (HDP) provides an elegant non-parametric extension; however, it is a complex model, and it is difficult to incorporate prior knowledge into it since the distribution over topics is implicit. We propose two new models that extend LDA in a simple and intuitive fashion by directly expressing a distribution over the number of topics. We also propose a new online Bayesian moment matching technique to learn the parameters and the number of topics of those models from streaming data. The approach achieves higher log-likelihood than batch and online HDP with fixed hyperparameters on several corpora.
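The core moment-matching idea can be sketched for a single Dirichlet distribution: given the first and second moments of the topic proportions, the Dirichlet hyperparameters are recoverable in closed form. This is a minimal illustration of the principle, not the authors' full streaming algorithm; the function name is my own.

```python
import numpy as np

def dirichlet_from_moments(m, s):
    """Recover Dirichlet parameters alpha from first moments m_i = E[theta_i]
    and second moments s_i = E[theta_i^2]."""
    # For Dirichlet(alpha) with a0 = sum(alpha):
    #   m_i = alpha_i / a0,   s_i = alpha_i (alpha_i + 1) / (a0 (a0 + 1))
    # Solving both gives a0 = (m_i - s_i) / (s_i - m_i^2) for any coordinate i;
    # averaging over coordinates adds numerical stability.
    a0 = np.mean((m - s) / (s - m ** 2))
    return m * a0
```

In an online setting, one would update the (approximate) posterior after each document and project it back onto a Dirichlet by matching these moments.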


Poisson Process Jumping between an Unknown Number of Rates: Application to Neural Spike Data

Neural Information Processing Systems

We introduce a model in which the rate of an inhomogeneous Poisson process is modulated by a Chinese restaurant process. Applying an MCMC sampler to this model allows us to perform Bayesian posterior inference about the number of states in Poisson-like data. Our sampler obtains accurate results on synthetic data, and we apply it to V1 neuron spike data to find discrete firing-rate states that depend on the orientation of a stimulus.
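The Chinese restaurant process is what lets the number of states stay open-ended: each new event either joins an existing state (with probability proportional to its size) or opens a new one. A minimal generative sketch, not the paper's sampler (function name and seeding are my own):

```python
import random

def crp_assignments(n, alpha, seed=0):
    """Sequentially assign n events to clusters ('tables') under a
    Chinese restaurant process with concentration alpha."""
    rng = random.Random(seed)
    counts = []   # number of events per cluster
    labels = []
    for i in range(n):
        # a new cluster opens with probability alpha / (i + alpha)
        if rng.random() < alpha / (i + alpha):
            counts.append(1)
            labels.append(len(counts) - 1)
        else:
            # otherwise join an existing cluster, proportional to its size
            r = rng.random() * i
            acc, k = 0.0, 0
            for k, c in enumerate(counts):
                acc += c
                if r < acc:
                    break
            counts[k] += 1
            labels.append(k)
    return labels
```

Larger `alpha` produces more clusters on average, which is how the prior expresses uncertainty about the number of rate states.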


Apple's Best New iOS 26 Feature Has Been on Pixel Phones for Years

WIRED

The iPhone's new software screens your calls using machine intelligence. Neat, but Google had the feature first, just like so many other features that rely on AI to work. Ever since I was a child, I've despised answering the phone when an unknown number calls. Who could be on the other end?


Fair Bayesian Model-Based Clustering

Lee, Jihu, Kim, Kunwoong, Kim, Yongdai

arXiv.org Machine Learning

Fair clustering has become a socially significant task with the advancement of machine learning technologies and the growing demand for trustworthy AI. Group fairness ensures that the proportions of each sensitive group are similar in all clusters. Most existing group-fair clustering methods are based on $K$-means clustering and thus require the distance between instances and the number of clusters to be given in advance. To resolve this limitation, we propose a fair Bayesian model-based clustering method called Fair Bayesian Clustering (FBC). We develop a specially designed prior which puts its mass only on fair clusters, and implement an efficient MCMC algorithm. Advantages of FBC are that it can infer the number of clusters and can be applied to any data type as long as the likelihood is defined (e.g., categorical data). Experiments on real-world datasets show that FBC (i) reasonably infers the number of clusters, (ii) achieves a competitive utility-fairness trade-off compared to existing fair clustering methods, and (iii) performs well on categorical data.
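The group-fairness notion in the abstract is measurable directly: compare each cluster's sensitive-group proportions with the overall proportions. A minimal sketch of such a balance check (the metric's exact form in the paper may differ; this function is my own illustration):

```python
from collections import Counter

def group_balance(cluster_labels, group_labels):
    """Max absolute deviation between each cluster's sensitive-group
    proportions and the overall proportions (0 = perfectly group-fair)."""
    n = len(cluster_labels)
    overall = {g: c / n for g, c in Counter(group_labels).items()}
    worst = 0.0
    for k in set(cluster_labels):
        members = [g for cl, g in zip(cluster_labels, group_labels) if cl == k]
        props = Counter(members)  # missing groups count as 0
        for g, p in overall.items():
            worst = max(worst, abs(props[g] / len(members) - p))
    return worst
```

A clustering whose every cluster mirrors the global group proportions scores 0; a clustering that segregates groups scores close to the largest group share.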


Dying Clusters Is All You Need -- Deep Clustering With an Unknown Number of Clusters

Leiber, Collin, Strauß, Niklas, Schubert, Matthias, Seidl, Thomas

arXiv.org Artificial Intelligence

Finding meaningful groups, i.e., clusters, in high-dimensional data such as images or texts without labeled data at hand is an important challenge in data mining. In recent years, deep clustering methods have achieved remarkable results in these tasks. However, most of these methods require the user to specify the number of clusters in advance. This is a major limitation since the number of clusters is typically unknown if labeled data is unavailable. Thus, an area of research has emerged that addresses this problem. Most of these approaches estimate the number of clusters separately from the clustering process. This results in a strong dependency of the clustering result on the quality of the initial embedding. Other approaches are tailored to specific clustering processes, making them hard to adapt to other scenarios. In this paper, we propose UNSEEN, a general framework that, starting from a given upper bound, is able to estimate the number of clusters. To the best of our knowledge, it is the first method that can be easily combined with various deep clustering algorithms. We demonstrate the applicability of our approach by combining UNSEEN with the popular deep clustering algorithms DCN, DEC, and DKM and verify its effectiveness through an extensive experimental evaluation on several image and tabular datasets. Moreover, we perform numerous ablations to analyze our approach and show the importance of its components. The code is available at: https://github.com/collinleiber/UNSEEN
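The "start from an upper bound and let superfluous clusters die" idea can be illustrated outside the deep-learning setting: assign points to their nearest centroid and drop centroids that attract too few points. This is a crude sketch of the pruning principle, not UNSEEN itself; the function and threshold are my own.

```python
import numpy as np

def prune_dead_clusters(X, centroids, min_size=2):
    """Assign each point to its nearest centroid and drop centroids that
    attract fewer than min_size points -- a crude 'dying clusters' step."""
    # pairwise distances: (n_points, n_centroids)
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    assign = d.argmin(axis=1)
    keep = [k for k in range(len(centroids)) if (assign == k).sum() >= min_size]
    return centroids[keep]
```

Repeating such a step during training shrinks an over-provisioned set of clusters toward the number actually supported by the data.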


3D exploration-based search for multiple targets using a UAV

Yousuf, Bilal, Lendek, Zsofia, Busoniu, Lucian

arXiv.org Artificial Intelligence

Consider an unmanned aerial vehicle (UAV) that searches for an unknown number of targets at unknown positions in 3D space. A particle filter uses imperfect measurements about the targets to update an intensity function that represents the expected number of targets. We propose a receding-horizon planner that selects the next UAV position by maximizing a joint exploration and target-refinement objective. Confidently localized targets are saved and removed from consideration. A nonlinear controller with an obstacle-avoidance component is used to reach the desired waypoints. We demonstrate the performance of our approach through a series of simulations, as well as in real-robot experiments with a Parrot Mambo drone that searches for targets from a constant altitude. The proposed planner outperforms both a lawnmower-pattern baseline and a target-refinement-only method.
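The waypoint-selection step can be caricatured with a particle representation of the intensity function: score each candidate position by the expected number of targets its sensor footprint would cover. This omits the paper's exploration term and horizon; the function and its parameters are my own sketch.

```python
import numpy as np

def next_waypoint(candidates, particles, weights, radius):
    """Pick the candidate UAV position whose sensor footprint covers the
    largest expected number of targets (sum of particle weights in range)."""
    best, best_score = None, -1.0
    for c in candidates:
        in_range = np.linalg.norm(particles - c, axis=1) <= radius
        score = weights[in_range].sum()
        if score > best_score:
            best, best_score = c, score
    return best
```

A full receding-horizon planner would add an exploration bonus for poorly observed regions and re-plan after every measurement update.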


Source Separation of Unknown Numbers of Single-Channel Underwater Acoustic Signals Based on Autoencoders

Sun, Qinggang, Wang, Kejun

arXiv.org Artificial Intelligence

Due to the influence of ocean environment noise and seawater channels, the separation of underwater acoustic signals is a challenging problem. Some studies have addressed the separation of underwater signals by isolating components with different characteristics, such as spatial orientation information and category differences, in a particular signal transform domain. Some methods separate signals directly in the feature domain based on expert knowledge [1-3]. The warping transform was used to separate dispersive time-frequency components in [1]. A depth-based method was proposed in [2], where the modified Fourier transform of the output power of a plane-wave beamformer was used to separate the signals obtained from a vertical line array. In [3], rigid and elastic acoustic scattering components of underwater target echoes were separated in the fractional Fourier transform domain based on a target echo highlight model. Most other algorithms rely on blind signal separation (BSS) methods [4-10]. In [4], the frequency components of the Detection of Envelope Modulation on Noise (DEMON) spectrum were used to separate signals in different directions via independent component analysis (ICA). According to the main frequency bands of the different signals in a linear superposition, [5] first applied bandpass filters and then employed eigenvalue decomposition for separation. [6] and [7] used the Sawada algorithm and ideal binary masking to separate artificially mixed whale songs.
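The band-based separation mentioned for [5] is easy to illustrate on a synthetic single-channel mixture: when the sources occupy disjoint frequency bands, zeroing the FFT bins outside each band recovers each component. This toy sketch (my own function, not the cited method) obviously fails once the sources' spectra overlap, which is why the autoencoder approach of the paper is needed.

```python
import numpy as np

def bandpass_separate(mixture, fs, bands):
    """Split a single-channel mixture into components by zeroing FFT bins
    outside each (low, high) frequency band -- a crude band-based separation."""
    spec = np.fft.rfft(mixture)
    freqs = np.fft.rfftfreq(len(mixture), d=1.0 / fs)
    out = []
    for lo, hi in bands:
        mask = (freqs >= lo) & (freqs <= hi)
        out.append(np.fft.irfft(spec * mask, n=len(mixture)))
    return out
```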



Matching Map Recovery with an Unknown Number of Outliers

Minasyan, Arshak, Galstyan, Tigran, Hunanyan, Sona, Dalalyan, Arnak

arXiv.org Artificial Intelligence

We consider the problem of finding the matching map between two sets of $d$-dimensional noisy feature-vectors. The distinctive feature of our setting is that we do not assume that all the vectors of the first set have their corresponding vector in the second set. If $n$ and $m$ are the sizes of these two sets, we assume that the matching map that should be recovered is defined on a subset of unknown cardinality $k^*\le \min(n,m)$. We show that, in the high-dimensional setting, if the signal-to-noise ratio is larger than $5(d\log(4nm/\alpha))^{1/4}$, then the true matching map can be recovered with probability $1-\alpha$. Interestingly, this threshold does not depend on $k^*$ and is the same as the one obtained in prior work in the case of $k = \min(n,m)$. The procedure for which the aforementioned property is proved is obtained by a data-driven selection among candidate mappings $\{\hat\pi_k:k\in[\min(n,m)]\}$. Each $\hat\pi_k$ minimizes the sum of squares of distances between two sets of size $k$. The resulting optimization problem can be formulated as a minimum-cost flow problem, and thus solved efficiently. Finally, we report the results of numerical experiments on both synthetic and real-world data that illustrate our theoretical results and provide further insight into the properties of the algorithms studied in this work.
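The inner problem — find the size-$k$ matching minimizing the sum of squared distances — can be shown by brute force on tiny inputs (the paper solves it efficiently as a minimum-cost flow; this exhaustive version, with my own function name, is only for intuition and scales terribly):

```python
import itertools
import math

def best_matching(A, B, k):
    """Brute-force the size-k matching between point sets A and B that
    minimizes the sum of squared distances between matched points."""
    def sq(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    best_cost, best_map = math.inf, None
    # try every k-subset of A against every ordered k-subset of B
    for rows in itertools.combinations(range(len(A)), k):
        for cols in itertools.permutations(range(len(B)), k):
            cost = sum(sq(A[i], B[j]) for i, j in zip(rows, cols))
            if cost < best_cost:
                best_cost, best_map = cost, dict(zip(rows, cols))
    return best_map, best_cost
```

Sweeping $k$ and selecting among the resulting candidate maps $\hat\pi_k$ in a data-driven way is exactly the step that handles the unknown number of outliers.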


Data Scientist's Guide to Efficient Coding in Python - KDnuggets

#artificialintelligence

In this article, I wanted to share a few tips for writing cleaner code that I have absorbed over the last year, mainly from pair programming. Generally speaking, making them part of my everyday coding routine has helped me produce high-quality Python scripts that are easily maintainable and scalable over time. Ever wondered why a senior developer's code looks so much better than a junior developer's? Read on to bridge the gap. Rather than giving generic examples of how to use these techniques, I will present real-life coding scenarios where I have actually used them!